Abstract

The aim of this note is to provide a short and self-contained proof of Hörmander's
theorem about the smoothness of transition probabilities for a diffusion under
Hörmander's "brackets condition". While both the result and the technique of
proof are well-known, the exposition given here is novel in two aspects. First, we
introduce Malliavin calculus in an "intuitive" way, without using Wiener's chaos
decomposition. While this may make it difficult to prove some of the standard
results in Malliavin calculus (boundedness of the derivative operator in $L^p$ spaces
for example), we are able to bypass these and to replace them by weaker results
that are still sufficient for our purpose. Second, we introduce a notion of "almost
implication" and "almost truth" (somewhat similar to what is done in fuzzy logic)
which allows one, once the foundations of Malliavin calculus are laid out, to give a
very short and streamlined proof of Hörmander's theorem that focuses on the main
ideas without clouding them with technical details.

Dedicated to the memory of Paul Malliavin. 

1 Introduction 

One of the main tools in many results on the convergence to equilibrium of Markov
processes is the presence of some form of "smoothing" for the semigroup. For
example, if a Markov operator $\mathcal{P}$ over a Polish space $\mathcal{X}$ possesses the strong Feller
property (namely it maps $\mathcal{B}_b(\mathcal{X})$, the space of bounded measurable functions, into
$\mathcal{C}_b(\mathcal{X})$, the space of bounded continuous functions), then one can conclude that
any two ergodic invariant measures for $\mathcal{P}$ must either coincide or have disjoint
topological supports. Since the latter can often be ruled out by some form of
controllability argument, we see how the strong Feller property is the basis for
many proofs of ergodicity.

It is then desirable to have criteria that are as simple to formulate as possible 
and that ensure that the Markov semigroup associated to a given Markov process 
has some smoothing property. One of the most natural classes of Markov processes
is given by diffusion processes and this will be the object of study in this note. Our
main object of study is a stochastic differential equation of the form



$$dx = V_0(x)\,dt + \sum_{i=1}^m V_i(x)\circ dW_i , \tag{1.1}$$



where the $V_i$'s are smooth vector fields on $\mathbf{R}^n$ and the $W_i$'s are independent
standard Wiener processes. In order to keep all arguments as straightforward as
possible, we will assume throughout this note that these vector fields satisfy the
coercivity assumptions necessary so that the solution flow to (1.1) is smooth with
respect to its initial condition and that all of its derivatives have moments of all
orders. This is satisfied for example if the $V_i$'s are $\mathcal{C}^\infty$ with bounded derivatives of
all orders.

Remark 1.1 We wrote (1.1) as a Stratonovich equation on purpose. This is for
two reasons: at a pragmatic level, this is the "correct" formulation which allows one to
give a clean statement of Hörmander's theorem (see Definition 1.2 below). At the
intuitive level, the question of smoothness of transition probabilities is related to
that of the extent of their support. The Stroock-Varadhan support theorem [SV72]
characterises this as consisting precisely of the closure of the set of points that can
be reached if the Wiener processes $W_i$ in (1.1) are replaced by arbitrary smooth
control functions. This would not be true in general for the Itô formulation.

It is well-known that if the equation (1.1) is elliptic, namely if, for every point
$x \in \mathbf{R}^n$, the linear span of $\{V_i(x)\}_{i=1}^m$ is all of $\mathbf{R}^n$, then the law of the solution
to (1.1) has a smooth density with respect to Lebesgue measure. Furthermore, the
corresponding Markov semigroup $\mathcal{P}_t$ defined by

$$\mathcal{P}_t\varphi(x_0) = \mathbf{E}_{x_0}\varphi(x_t) ,$$

is such that $\mathcal{P}_t\varphi$ is smooth, even if $\varphi$ is only bounded measurable. (Think of the
solution to the heat equation, which corresponds to the simplest case where $V_0 = 0$
and the $V_i$ form an orthonormal basis of $\mathbf{R}^n$.) In practice however, one would
like to obtain a criterion that also applies to some equations where the ellipticity
assumption fails. For example, a very well-studied model of equilibrium statistical
mechanics is given by the Langevin equation:

$$dq = p\,dt , \qquad dp = -\nabla V(q)\,dt - p\,dt + \sqrt{2T}\,dW(t) ,$$

where $T > 0$ should be interpreted as a temperature, $V : \mathbf{R}^n \to \mathbf{R}_+$ is a sufficiently
coercive potential function, and $W$ is an $n$-dimensional Wiener process. Since
solutions to this equation take values in $\mathbf{R}^{2n}$ (both $p$ and $q$ are $n$-dimensional),
this is definitely not an elliptic equation. At an intuitive level however, one would
expect it to have some smoothing properties: smoothing reflects the spreading of
our uncertainty about the position of the solution, and the uncertainty on $p$ due
to the presence of the noise terms gets instantly transmitted to $q$ via the equation
$dq = p\,dt$.

In a seminal paper [Hör67], Hörmander was the first to formulate the "correct"
non-degeneracy condition ensuring that solutions to (1.1) have a smoothing effect.
To describe this non-degeneracy condition, recall that the Lie bracket $[U, V]$
between two vector fields $U$ and $V$ on $\mathbf{R}^n$ is the vector field defined by

$$[U, V](x) = DV(x)\,U(x) - DU(x)\,V(x) ,$$

where we denote by $DU$ the derivative matrix given by $(DU)_{ij} = \partial_j U_i$. This
notation is consistent with the usual notation for the commutator between two
linear operators since, if we denote by $A_U$ the first-order differential operator acting
on smooth functions $f$ by $A_U f(x) = \langle U(x), \nabla f(x)\rangle$, then we have the identity
$A_{[U,V]} = [A_U, A_V]$.

With this notation at hand, we give the following definition: 

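Definition 1.2 Given an SDE (1.1), define a collection of vector fields $\mathcal{V}_k$ by

$$\mathcal{V}_0 = \{V_i : i > 0\} , \qquad \mathcal{V}_{k+1} = \mathcal{V}_k \cup \{[U, V_j] : U \in \mathcal{V}_k \ \&\ j \ge 0\} .$$

We also define the vector spaces $\mathcal{V}_k(x) = \mathrm{span}\{V(x) : V \in \mathcal{V}_k\}$. We say that
(1.1) satisfies the parabolic Hörmander condition if $\bigcup_{k\ge 1} \mathcal{V}_k(x) = \mathbf{R}^n$ for every
$x \in \mathbf{R}^n$.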
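
As an illustration of Definition 1.2, the following sketch checks the bracket condition numerically for the Langevin equation with $n = 1$. The quadratic potential $V(q) = q^2/2$, the temperature $T = 1$, the evaluation point and the helper names (`jacobian`, `lie_bracket`) are assumptions made for this example only; derivatives are approximated by central differences rather than computed symbolically.

```python
import math

def jacobian(F, x, h=1e-5):
    """Central-difference approximation of DF(x), with (DF)[i][j] = d_j F_i."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        cols.append([(a - b) / (2 * h) for a, b in zip(Fp, Fm)])
    # cols[j][i] approximates d_j F_i; transpose into row-major form
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def lie_bracket(U, V, x):
    """[U, V](x) = DV(x) U(x) - DU(x) V(x), the convention used in the text."""
    a = matvec(jacobian(V, x), U(x))
    b = matvec(jacobian(U, x), V(x))
    return [ai - bi for ai, bi in zip(a, b)]

T = 1.0  # temperature (illustrative value)

def V0(z):
    q, p = z
    return [p, -q - p]          # drift, with grad V(q) = q for V(q) = q^2 / 2

def V1(z):
    return [0.0, math.sqrt(2 * T)]   # constant noise vector field

x = [0.3, -0.7]                 # arbitrary test point (q, p)
v1 = V1(x)
b = lie_bracket(V1, V0, x)      # an element of the collection V_1 of Definition 1.2
det = v1[0] * b[1] - v1[1] * b[0]   # non-zero iff v1 and b span R^2
```

Here `V1` alone spans only the $p$-direction, so the equation is not elliptic, but the bracket with the drift supplies the missing $q$-direction; a non-zero `det` at every point is exactly the parabolic Hörmander condition for this example.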

With these notations, Hörmander's theorem can be formulated as

Theorem 1.3 Consider (1.1) and assume that all vector fields have bounded
derivatives of all orders. If it satisfies the parabolic Hörmander condition, then its
solutions admit a smooth density with respect to Lebesgue measure and the
corresponding Markov semigroup maps bounded functions into smooth functions.

Hörmander's original proof was formulated in terms of second-order differential
operators and was purely analytical in nature. Since the main motivation on
the other hand was probabilistic and since, as we will see below, Hörmander's
condition can be understood at the level of properties of the trajectories of (1.1),
a more stochastic proof involving the original stochastic differential equation was
sought. The breakthrough came with Malliavin's seminal work [Mal78], where
he laid the foundations of what is now known as the "Malliavin calculus", a
differential calculus in Wiener space, and used it to give a probabilistic proof of
Hörmander's theorem. This new approach proved to be extremely successful and
soon a number of authors studied variants and simplifications of the original proof
[Bis81b, Bis81a, KS84, KS85, KS87, Nor86]. Even now, more than three decades
after Malliavin's original work, his techniques prove to be sufficiently flexible to
obtain related results for a number of extensions of the original problem, including
for example SDEs with jumps [Tak02, IK06, Cas09, Tak10], infinite-dimensional
systems [Oco88, BT05, MP06, HM06, HM11], and SDEs driven by Gaussian
processes other than Brownian motion [BH07, CF10, HP11].

A complete rigorous proof of Theorem 1.3 goes somewhat beyond the scope of
these notes. However, we hope to be able to give a convincing argument showing
why this result is true and what the main steps involved in its probabilistic proof are.
The aim in writing these notes was to be sufficiently self-contained so that a strong
PhD student interested in stochastic analysis would be able to fill in the
gaps without requiring additional ideas. The interested reader can find the technical
details required to make the proof rigorous in [Mal78, KS84, KS85, KS87, Nor86,
Nua95]. Hörmander's original, completely different, proof using fractional
integrations can be found in [Hör67]. Yet another, completely different, functional-analytic
proof using the theory of pseudo-differential operators was developed by Kohn in
[Koh78] and can also be found in [Hör85] or, in a slightly different context, in the
recent book [HN05].

The remainder of these notes is organised as follows. First, in Section 2 below,
we will show why it is natural that the iterated Lie brackets appear in Hörmander's
condition. Then, in Section 3, we will give an introduction to Malliavin calculus,
including in particular its integration by parts formula in Wiener space. Finally, in
Section 4, we apply these tools to the particular case of smooth diffusion processes
in order to give a probabilistic proof of Hörmander's theorem.

2 Why is it the correct condition?

At first sight, the condition given in Definition 1.2 might seem a bit strange.
Indeed, the vector field $V_0$ is treated differently from all the others: it appears in the
recursive definition of the $\mathcal{V}_k$, but not in $\mathcal{V}_0$. This can be understood in the
following way: consider trajectories of (1.1) as curves in space-time. By the
Stroock-Varadhan support theorem [SV72], the law of the solution to (1.1) on pathspace
is supported by the closure of those smooth curves that, at every point $(x,t)$, are
tangent to the hyperplane spanned by $\{\hat V_0, \ldots, \hat V_m\}$, where we set

$$\hat V_0(x,t) = (V_0(x), 1) , \qquad \hat V_j(x,t) = (V_j(x), 0) , \quad j = 1, \ldots, m .$$

With this notation at hand, we could define $\hat{\mathcal{V}}_k$ as in Definition 1.2, but with
$\hat{\mathcal{V}}_0 = \{\hat V_0, \ldots, \hat V_m\}$. Then, it is easy to check that Hörmander's condition is equivalent
to the condition that $\bigcup_{k\ge 1} \hat{\mathcal{V}}_k(x,t) = \mathbf{R}^{n+1}$ for every $(x,t) \in \mathbf{R}^{n+1}$.

This condition however has a simple geometric interpretation. For a smooth
manifold $\mathcal{M}$, recall that $E \subset T\mathcal{M}$ is a smooth subbundle of dimension $d$ if
$E_x \subset T_x\mathcal{M}$ is a vector space of dimension $d$ at every $x \in \mathcal{M}$ and if the dependency
$x \mapsto E_x$ is smooth. (Locally, $E_x$ is the linear span of finitely many smooth vector
fields on $\mathcal{M}$.) A subbundle is called integrable if, whenever $U, V$ are vector fields
on $\mathcal{M}$ taking values in $E$, their Lie bracket $[U, V]$ also takes values in $E$.

With these definitions at hand, recall the well-known Frobenius integrability 
theorem from differential geometry: 

Theorem 2.1 Let $\mathcal{M}$ be a smooth $n$-dimensional manifold and let $E \subset T\mathcal{M}$ be a
smooth vector bundle of dimension $d < n$. Then $E$ is integrable if and only if there
(locally) exists a smooth foliation of $\mathcal{M}$ into leaves of dimension $d$ such that, for
every $x \in \mathcal{M}$, the tangent space of the leaf passing through $x$ is given by $E_x$.

In view of this result, Hörmander's condition is not surprising. Indeed, if we
define $E(x,t) = \sum_{k\ge 0} \hat{\mathcal{V}}_k(x,t)$, then this gives us a subbundle of $\mathbf{R}^{n+1}$ which is
integrable by construction of the $\hat{\mathcal{V}}_k$. Note that the dimension of $E(x,t)$ could in
principle depend on $(x,t)$, but since the dimension is a lower semicontinuous
function, it will take its maximal value on an open set. If, on some open set, this
maximal value is less than $n + 1$, then Theorem 2.1 tells us that there exists a
submanifold (with boundary) $\bar{\mathcal{M}} \subset \mathcal{M}$ of dimension strictly less than $n + 1$ such that
$T_{(y,s)}\bar{\mathcal{M}} = E(y,s)$ for every $(y,s) \in \bar{\mathcal{M}}$. In particular, all the curves appearing in the
Stroock-Varadhan support theorem and supporting the law of the solution to (1.1)
must lie in $\bar{\mathcal{M}}$ until they reach its boundary. As a consequence, since $\bar{\mathcal{M}}$ is always
transverse to the sections with constant $t$, the solutions at time $t$ will, with positive
probability, lie in a submanifold of $\mathcal{M}$ of strictly positive codimension. This
immediately implies that the transition probabilities cannot be absolutely continuous
with respect to Lebesgue measure.

To summarise, if Hörmander's condition fails on an open set, then transition
probabilities cannot have a density with respect to Lebesgue measure, thus showing
that Hörmander's condition is "almost necessary" for the existence of densities.
The hard part of course is to show that it is a sufficient condition. Intuitively, the
reason is that Hörmander's condition allows the solution to (1.1) to "move in all
directions". Why this is so can be seen from the following interpretation of the Lie
brackets. Set



$$u_n(t) = \frac{1}{n}\cos(n^2 t) , \qquad v_n(t) = \frac{1}{n}\sin(n^2 t) ,$$

and consider the solution to

$$\dot x = U(x)\,\dot u_n(t) + V(x)\,\dot v_n(t) . \tag{2.1}$$

We claim that, as $n \to \infty$, this converges to the solution to

$$\dot x = \tfrac{1}{2}[U,V](x) . \tag{2.2}$$

This can be seen as follows. If we integrate (2.1) over a short time interval, we
have the first order approximation

$$x(h) \approx x^{(1)}(h) = x_0 + U(x_0)\,u_n(h) + V(x_0)\,v_n(h) ,$$

which simply converges to $x_0$ as $n \to \infty$. To second order, however, we have

$$x(h) \approx x_0 + \int_0^h \big(U(x^{(1)})\,\dot u_n + V(x^{(1)})\,\dot v_n\big)\,dt$$
$$\approx x^{(1)}(h) + \int_0^h \big(DU(x_0)\,\dot u_n + DV(x_0)\,\dot v_n\big)\big(U(x_0)\,u_n + V(x_0)\,v_n\big)\,dt$$
$$\approx x_0 + \int_0^h \big(DU(x_0)V(x_0)\,v_n\dot u_n + DV(x_0)U(x_0)\,u_n\dot v_n\big)\,dt .$$

Here, we used the fact that the integral of $u_n\dot u_n$ (and similarly for $v_n\dot v_n$) is given
by $\tfrac12 u_n^2$ and therefore converges to $0$ as $n \to \infty$. Note now that over a period,
$v_n(t)\dot u_n(t)$ averages to $-\tfrac12$ and $u_n(t)\dot v_n(t)$ averages to $\tfrac12$, thus showing that one
does indeed obtain (2.2) in the limit. This reasoning shows that, by combining
motions in the directions $U$ and $V$, it is possible to approximate, to within arbitrary
accuracy, motion in the direction $[U, V]$.
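
The convergence of (2.1) to (2.2) can be observed numerically. The sketch below is only an illustration under assumptions that are not in the text: it takes the vector fields $U = (1,0,0)$ and $V = (0,1,x_1)$ on $\mathbf{R}^3$, for which $[U,V] = DV\,U - DU\,V = (0,0,1)$, starts from the origin, and integrates (2.1) with a classical fourth-order Runge-Kutta scheme for $n = 10$. The limiting equation (2.2) then predicts $x(1) = (0, 0, 1/2)$.

```python
import math

def rhs(t, x, n):
    """Right-hand side of (2.1) for U = (1, 0, 0) and V = (0, 1, x_1)."""
    du = -n * math.sin(n * n * t)   # derivative of u_n(t) = cos(n^2 t) / n
    dv = n * math.cos(n * n * t)    # derivative of v_n(t) = sin(n^2 t) / n
    return [du, dv, x[0] * dv]

def rk4(n, t_end=1.0, steps=4000):
    """Integrate (2.1) from x(0) = 0 up to time t_end."""
    x = [0.0, 0.0, 0.0]
    h = t_end / steps
    t = 0.0
    for _ in range(steps):
        k1 = rhs(t, x, n)
        k2 = rhs(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k1)], n)
        k3 = rhs(t + h / 2, [xi + h / 2 * ki for xi, ki in zip(x, k2)], n)
        k4 = rhs(t + h, [xi + h * ki for xi, ki in zip(x, k3)], n)
        x = [xi + h / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x

x_end = rk4(n=10)   # third component should be close to 1/2
```

The first two components only oscillate with amplitude of order $1/n$, while the third one drifts at rate close to $1/2$: the rapidly rotating controls in the directions $U$ and $V$ produce net motion in the direction $[U,V]$.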

A similar reasoning shows that if we consider

$$\dot x = U(x) + V(x)\,\dot v_n(t) ,$$

then, to lowest order in $1/n$, we obtain that as $n \to \infty$, $x$ follows

$$\dot x \approx U(x) + \frac{1}{2n}[U,V](x) .$$

Combining these interpretations of the meaning of Lie brackets with the
Stroock-Varadhan support theorem suggests that, if Hörmander's condition holds, then
the support of the law of $x_t$ will contain an open set around the solution at time $t$
to the deterministic system

$$\dot x = V_0(x) , \qquad x(0) = x_0 .$$

This should at least render it plausible that under these conditions, the law of $x_t$ has
a density with respect to Lebesgue measure. The aim of this note is to demonstrate
how to turn this heuristic into a mathematical theorem with, hopefully, a minimum
amount of effort.

Remark 2.2 While Hörmander's condition implies that the control system
associated to (1.1) reaches an open set around the solution to the deterministic equation
$\dot x = V_0(x)$, it does not imply in general that it can reach an open set around $x_0$. In
particular, it is not true that the parabolic Hörmander condition implies that (1.1)
can reach every open set. A standard counterexample is given by

$$dx = -\sin(x)\,dt + \cos(x)\circ dW(t) , \qquad x_0 = 0 ,$$

which satisfies Hörmander's condition but can never exit the interval $[-\pi/2, \pi/2]$.
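
One way to see the confinement in this counterexample (this argument is an addition for illustration, not taken from the text) is the change of variables $x = \mathrm{gd}(y) = 2\arctan(\tanh(y/2))$. Since $\mathrm{gd}'(y) = \cos(\mathrm{gd}(y))$ and $\tan(\mathrm{gd}(y)) = \sinh(y)$, the Stratonovich chain rule shows that if $dy = -\sinh(y)\,dt + dW(t)$ with $y_0 = 0$, then $x = \mathrm{gd}(y)$ solves the equation of Remark 2.2, and $\mathrm{gd}$ takes values in $(-\pi/2, \pi/2)$. The sketch below simulates $y$ with the Euler-Maruyama scheme; step size, time horizon and seed are arbitrary choices.

```python
import math
import random

def gd(y):
    """Gudermannian function, mapping R onto the open interval (-pi/2, pi/2)."""
    return 2.0 * math.atan(math.tanh(0.5 * y))

def simulate(T=5.0, steps=5000, seed=0):
    """Path of x = gd(y), where dy = -sinh(y) dt + dW is discretised by Euler-Maruyama."""
    rng = random.Random(seed)
    dt = T / steps
    y = 0.0
    xs = [gd(y)]
    for _ in range(steps):
        # additive noise, so the Ito and Stratonovich forms of the y-equation coincide
        y += -math.sinh(y) * dt + rng.gauss(0.0, math.sqrt(dt))
        xs.append(gd(y))
    return xs

xs = simulate()
max_abs_x = max(abs(x) for x in xs)   # stays strictly below pi / 2
```

The noise coefficient $\cos(x)$ vanishes at $\pm\pi/2$ while the drift $-\sin(x)$ points inwards there, which is why the bracket condition holds at these points and yet they cannot be crossed.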

3 An Introduction to Malliavin calculus

In this section, we collect a number of tools that will be needed in the proof. The 
main tool is the integration by parts formula from Malliavin calculus, as well of 
course as Malliavin calculus itself. 

The main tool in the proof is the Malliavin calculus with its integration by
parts formula in Wiener space, which was developed precisely in order to provide a
probabilistic proof of Theorem 1.3. It essentially relies on the fact that the image
of a Gaussian measure under a smooth submersion that is sufficiently integrable
possesses a smooth density with respect to Lebesgue measure. This can be shown
in the following way. First, one observes the following fact:

Lemma 3.1 Let $\mu$ be a probability measure on $\mathbf{R}^n$ such that the bound

$$\Big|\int_{\mathbf{R}^n} D^{(k)}G(x)\,\mu(dx)\Big| \le C_k \|G\|_\infty$$

holds for every smooth bounded function $G$ and every $k \ge 1$. Then $\mu$ has a smooth
density with respect to Lebesgue measure.

Proof. Let $s > n/2$ so that $H^s \subset \mathcal{C}_b$ by Sobolev embedding. By duality, the
assumption then implies that every distributional derivative of $\mu$ belongs to the
Sobolev space $H^{-s}$, so that $\mu$ belongs to $H^\ell$ for every $\ell \in \mathbf{R}$. The result then
follows from the fact that $H^\ell \subset \mathcal{C}^k$ as soon as $\ell > k + \frac{n}{2}$. □

Consider now a sequence of $N$ independent Gaussian random variables $\delta w_k$
with variances $\delta t_k$ for $k \in \{1, \ldots, N\}$, as well as a smooth map $X : \mathbf{R}^N \to \mathbf{R}^n$.
We also denote by $w$ the collection $\{\delta w_k\}_{k\ge 1}$ and we define the $n \times n$
matrix-valued map

$$M_{ij}(w) = \sum_k \partial_k X_i(w)\,\partial_k X_j(w)\,\delta t_k , \tag{3.1}$$

where we use $\partial_k$ as a shorthand for the partial derivative with respect to the
variable $\delta w_k$. With this notation, $X$ being a submersion is equivalent to $M(w)$ being
invertible for every $w$.

Before we proceed, let us introduce additional notation, which hints at the fact
that one would really like to interpret the $\delta w_k$ as the increments of a Wiener process
over an interval of length $\delta t_k$. When considering a family $\{F_k\}_{k=1}^N$ of maps from
$\mathbf{R}^N \to \mathbf{R}^n$, we identify it with a continuous family $\{F_t\}_{t\ge 0}$, where

$$F_t = F_k , \quad t \in [t_k, t_{k+1}) , \qquad t_k = \sum_{\ell \le k} \delta t_\ell . \tag{3.2}$$

Note that with this convention, we have $t_0 = 0$, $t_1 = \delta t_1$, etc. This is of course
an abuse of notation since $F_t$ is not equal to $F_k$ for $t = k$, but we hope that it will
always be clear from the context whether the index is a discrete or a continuous
variable. We also set $F_t = 0$ for $t \ge t_N$. With this notation, we have the natural
identity

$$\int F_t\,dt = \sum_{k=1}^N F_k\,\delta t_k .$$

Furthermore, given a smooth map $G : \mathbf{R}^N \to \mathbf{R}$, we will from now on denote by
$\mathcal{D}_tG$ the family of maps such that $\mathcal{D}_tG = \partial_kG$ for $t \in [t_k, t_{k+1})$, so that (3.1) can
be rewritten as

$$M_{ij}(w) = \int \mathcal{D}_tX_i(w)\,\mathcal{D}_tX_j(w)\,dt .$$

The quantity $\mathcal{D}_tG$ is called the Malliavin derivative of the random variable $G$.

The main feature of the Malliavin derivative operator suggesting that one
expects it to be well-posed in the limit $N \to \infty$ is that it was set up in such a
way that it is invariant under refinement of the mesh $\{\delta t_k\}$ in the following way.
For every $k$, set $\delta w_k = \delta w_k^- + \delta w_k^+$, where $\delta w_k^\pm$ are independent Gaussians with
variances $\delta t_k^\pm$ with $\delta t_k^- + \delta t_k^+ = \delta t_k$, and then identify maps $G : \mathbf{R}^N \to \mathbf{R}$ with a
map $\bar G : \mathbf{R}^{2N} \to \mathbf{R}$ by

$$\bar G(\delta w_1^\pm, \ldots, \delta w_N^\pm) = G(\delta w_1^- + \delta w_1^+, \ldots, \delta w_N^- + \delta w_N^+) .$$

Then, for every $t \ge 0$, $\mathcal{D}_t\bar G$ is precisely the map identified with $\mathcal{D}_tG$.
With all of these notations at hand, we then have the following result:

Theorem 3.2 Let $X : \mathbf{R}^N \to \mathbf{R}^n$ be smooth, assume that $M(w)$ is invertible for
every $w$ and that, for every $p \ge 1$ and every $m \ge 0$, we have

$$\mathbf{E}\,|\partial_{k_1}\cdots\partial_{k_m}X(w)|^p < \infty , \qquad \mathbf{E}\,\|M(w)^{-1}\|^p < \infty . \tag{3.3}$$

Then the law of $X(w)$ has a smooth density with respect to Lebesgue measure.
Furthermore, the derivatives of the law of $X$ can be bounded from above by
expressions that depend only on the bounds (3.3), but are independent of $N$, provided
that $\sum \delta t_k = T$ remains fixed.

Besides Lemma 3.1, the main ingredient of the proof of Theorem 3.2 is the
following integration by parts formula which lies at the heart of the success of
Malliavin calculus. If $F_k$ and $G$ are square integrable functions with square
integrable derivatives, then we have the identity

$$\mathbf{E}\Big(\int \mathcal{D}_tG(w)\,F_t(w)\,dt\Big) = \mathbf{E}\sum_k \partial_kG(w)\,F_k(w)\,\delta t_k$$
$$= \mathbf{E}\,G(w)\sum_k F_k(w)\,\delta w_k - \mathbf{E}\,G(w)\sum_k \partial_kF_k(w)\,\delta t_k$$
$$= \mathbf{E}\Big(G(w)\int F_t\,dw(t)\Big) , \tag{3.4}$$

where we defined the Skorokhod integral $\int F_t\,dw(t)$ by the expression on the
second line. Note that in order to obtain (3.4), we only integrated by parts with respect
to the variables $\delta w_k$.
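
The identity (3.4) can be checked by a small Monte Carlo experiment. The sketch below is an illustration under assumed choices that do not come from the text: a mesh with $N = 2$ and $\delta t_1 = \delta t_2 = 1/2$, the test function $G(w) = \sin(\delta w_1 + \delta w_2)$, and the family $F_1 = \delta w_1\,\delta w_2$, $F_2 = \delta w_1$. Note that $F_1$ depends on the later increment $\delta w_2$, so it is not adapted and the correction term $\partial_1F_1\,\delta t_1$ in the Skorokhod integral does not vanish.

```python
import math
import random

def check_ibp(samples=200_000, seed=1):
    """Monte Carlo estimates of the first and last expressions in (3.4)."""
    rng = random.Random(seed)
    dt = [0.5, 0.5]                       # mesh with N = 2
    sd = [math.sqrt(v) for v in dt]
    lhs = rhs = 0.0
    for _ in range(samples):
        w1 = rng.gauss(0.0, sd[0])
        w2 = rng.gauss(0.0, sd[1])
        G = math.sin(w1 + w2)
        dG = math.cos(w1 + w2)            # d_1 G = d_2 G for this choice of G
        F = [w1 * w2, w1]                 # F_1 is not adapted: it depends on w2
        dF = [w2, 0.0]                    # d_1 F_1 and d_2 F_2
        # E sum_k d_k G F_k dt_k
        lhs += dG * (F[0] * dt[0] + F[1] * dt[1])
        # discrete Skorokhod integral: sum_k F_k dw_k - sum_k d_k F_k dt_k
        skorokhod = F[0] * w1 + F[1] * w2 - (dF[0] * dt[0] + dF[1] * dt[1])
        rhs += G * skorokhod
    return lhs / samples, rhs / samples

lhs, rhs = check_ibp()   # the two estimates agree up to Monte Carlo error
```

Dropping the term `dF[0] * dt[0]` turns the Skorokhod integral into a naive Riemann sum and breaks the agreement, which shows that the correction is what makes (3.4) hold for non-adapted integrands.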

Remark 3.3 The Skorokhod integral is really an extension of the usual Itô integral,
which is the justification for our notation. This is because, if $F_t$ is an adapted
process, then $F_k$ is independent of $\delta w_\ell$ for $\ell \ge k$ by definition. As a consequence,
the term $\partial_kF_k$ drops and we are reduced to the usual Itô integral.

Remark 3.4 It follows immediately from the definition that one has the identity

$$\mathcal{D}_t\int F_s\,dw(s) = F_t + \int \mathcal{D}_tF_s\,dw(s) . \tag{3.5}$$

Formally, one can think of this identity as being derived from the Leibniz rule,
combined with the identity $\mathcal{D}_t(dw(s)) = \delta(t - s)\,ds$, which is a kind of continuous
analogue of the trivial discrete identity $\partial_k\,\delta w_\ell = \delta_{k\ell}$.

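This Skorokhod integral satisfies the following extension of Itô's isometry:

Proposition 3.5 Let $F_k$ be square integrable functions with square integrable
derivatives, then

$$\mathbf{E}\Big(\int F_t\,dw(t)\Big)^2 = \mathbf{E}\int F_t^2(w)\,dt + \mathbf{E}\int\!\!\int \mathcal{D}_tF_s(w)\,\mathcal{D}_sF_t(w)\,ds\,dt$$
$$\le \mathbf{E}\int F_t^2(w)\,dt + \mathbf{E}\int\!\!\int |\mathcal{D}_tF_s(w)|^2\,ds\,dt$$

holds.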

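Proof. It follows from the definition that one has the identity

$$\mathbf{E}\Big(\int F_t\,dw(t)\Big)^2 = \sum_{k,\ell}\mathbf{E}\big(F_kF_\ell\,\delta w_k\delta w_\ell + \partial_kF_k\,\partial_\ell F_\ell\,\delta t_k\delta t_\ell - 2F_k\,\partial_\ell F_\ell\,\delta w_k\delta t_\ell\big) .$$

Applying the identity $\mathbf{E}\,G\,\delta w_\ell = \mathbf{E}\,\partial_\ell G\,\delta t_\ell$ to the first term in the above formula
(with $G = F_kF_\ell\,\delta w_k$), we thus obtain

$$\ldots = \sum_{k,\ell}\mathbf{E}\big(F_kF_\ell\,\delta_{k,\ell}\,\delta t_\ell + \partial_kF_k\,\partial_\ell F_\ell\,\delta t_k\delta t_\ell + (F_\ell\,\partial_\ell F_k - F_k\,\partial_\ell F_\ell)\,\delta w_k\delta t_\ell\big) .$$

Applying the same identity to the last term then finally leads to

$$\ldots = \sum_{k,\ell}\mathbf{E}\big(F_kF_\ell\,\delta_{k,\ell}\,\delta t_\ell + \partial_kF_\ell\,\partial_\ell F_k\,\delta t_k\delta t_\ell\big) ,$$

which is precisely the desired result. □

As a consequence, we have the following: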
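
Proposition 3.6 Assume that $\sum \delta t_k = T < \infty$. Then, for every $p > 0$ there exists
$C > 0$ and $k > 0$ such that the bound

$$\mathbf{E}\Big|\int F_s\,dw(s)\Big|^{2p} \le C\Big(1 + \sum_{0\le \ell\le k}\ \sup_{t_0,\ldots,t_\ell} \mathbf{E}\big|\mathcal{D}_{t_1}\cdots\mathcal{D}_{t_\ell}F_{t_0}\big|^{2p}\Big)$$

holds. Here, $C$ may depend on $T$ and $p$, but $k$ depends only on $p$.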

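Proof. Since the case $p \le 2$ follows from Proposition 3.5, we can assume
without loss of generality that $p > 2$. Combining (3.4) with (3.5) and then applying
Hölder's inequality, we have

$$\mathbf{E}\Big|\int F_s\,dw(s)\Big|^{2p} = (2p-1)\,\mathbf{E}\,\Big|\int F_s\,dw(s)\Big|^{2p-2}\int F_t\Big(F_t + \int \mathcal{D}_tF_s\,dw(s)\Big)\,dt$$
$$\le \frac12\,\mathbf{E}\Big|\int F_s\,dw(s)\Big|^{2p} + c\,\mathbf{E}\int\Big|F_t + \int \mathcal{D}_tF_s\,dw(s)\Big|^{2p}dt + c\,\mathbf{E}\int |F_t|^{2p}\,dt$$
$$\le \frac12\,\mathbf{E}\Big|\int F_s\,dw(s)\Big|^{2p} + c\,\mathbf{E}\int\Big|\int \mathcal{D}_tF_s\,dw(s)\Big|^{2p}dt + c\,\mathbf{E}\int (1 + |F_t|)^{2p}\,dt ,$$

where $c$ is some constant depending on $p$ and $T$ that changes from line to line. The
claim now follows by induction. □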

Remark 3.7 The bound in Proposition 3.6 is clearly very far from optimal.
Actually, it is known that, for every $p \ge 1$, there exists $C$ such that

$$\mathbf{E}\Big|\int F_s\,dw(s)\Big|^{2p} \le C\,\mathbf{E}\Big|\int F_s^2\,ds\Big|^{p} + C\,\mathbf{E}\Big|\int\!\!\int |\mathcal{D}_tF_s|^2\,ds\,dt\Big|^{p}$$

even if $T = \infty$. However, this extension of the Burkholder-Davis-Gundy inequality
requires highly non-trivial harmonic analysis and, to the best of the author's
knowledge, cannot be reduced to a short elementary calculation. The reader interested in
knowing more can find its proof in [Nua95, Ch. 1.3-1.5].

The proof of Theorem 3.2 is now straightforward:

Proof of Theorem 3.2. We want to show that Lemma 3.1 can be applied. For $\eta \in
\mathbf{R}^n$, we then have from the definition of $M$ the identity

$$(D_jG)(X(w)) = \sum_{k,m}\partial_k\big(G(X(w))\big)\,\partial_kX_m(w)\,\delta t_k\,M^{-1}_{mj}(w) . \tag{3.6}$$

Combining this identity with (3.4), it follows that

$$\mathbf{E}\,D_jG(X) = \mathbf{E}\Big(G(X(w))\sum_m\int \mathcal{D}_tX_m(w)\,M^{-1}_{mj}(w)\,dw(t)\Big) . \tag{3.7}$$

Note that, by the chain rule, one has the identity

$$\mathcal{D}_tM^{-1} = -M^{-1}(\mathcal{D}_tM)M^{-1} ,$$

and similarly for higher order derivatives, so that the Malliavin derivatives of $M^{-1}$
can be bounded by terms involving $M^{-1}$ and the Malliavin derivatives of $X$.

Combining this with Proposition 3.5 and (3.3) immediately shows that the
requested result holds for $k = 1$. Higher values of $k$ can be treated by induction by
repeatedly applying (3.6). This will lead to expressions of the type (3.7), with the
right hand side consisting of multiple Skorokhod integrals of higher order
polynomials in $M^{-1}$ and derivatives of $X$.

By Proposition 3.6, the moments of each of the terms appearing in this way can
be bounded by finitely many of the expressions appearing in the assumption so that
the required statement follows. □

4 Application to Diffusion Processes 

We are now almost ready to tackle the proof of Hörmander's theorem. Before we
start, we discuss how $\mathcal{D}_sX_t$ can be computed when $X_t$ is the solution to an SDE of
the type (1.1) and we use this discussion to formulate precise assumptions for our
theorem.

4.1 Malliavin Calculus for Diffusion Processes 

By taking the limit $N \to \infty$ and $\delta t_k \to 0$ with $\sum \delta t_k = 1$, the results in the
previous section show that one can define a "Malliavin derivative" operator $\mathcal{D}$, acting
on a suitable class of "smooth" random variables and returning a stochastic
process that has all the usual properties of a derivative. Let us see how it acts on the
solution to an SDE of the type (1.1).

An important tool for our analysis will be the linearisation of (1.1) with respect
to its initial condition. Denote by $\Phi_t$ the (random) solution map to (1.1), so that
$x_t = \Phi_t(x_0)$. It is then known that, under Assumption 4.2 below, $\Phi_t$ is almost
surely a smooth map for every $t$. We actually obtain a flow of smooth maps, namely
a two-parameter family of maps $\Phi_{s,t}$ such that $x_t = \Phi_{s,t}(x_s)$ for every $s \le t$ and
such that $\Phi_{t,u}\circ\Phi_{s,t} = \Phi_{s,u}$ and $\Phi_t = \Phi_{0,t}$. For a given initial condition $x_0$, we
then denote by $J_{s,t}$ the derivative of $\Phi_{s,t}$ evaluated at $x_s$. Note that the chain rule
immediately implies that one has the composition law $J_{s,u} = J_{t,u}J_{s,t}$, where the
product is given by simple matrix multiplication. We also use the notation $J^{(k)}_{s,t}$ for
the $k$th-order derivative of $\Phi_{s,t}$.

It is straightforward to obtain an equation governing $J_{0,t}$ by differentiating both
sides of (1.1) with respect to $x_0$. This yields the non-autonomous linear equation

$$dJ_{0,t} = DV_0(x_t)\,J_{0,t}\,dt + \sum_{i=1}^m DV_i(x_t)\,J_{0,t}\circ dW_i(t) , \qquad J_{0,0} = I , \tag{4.1}$$

where $I$ is the $n \times n$ identity matrix. Higher order derivatives $J^{(k)}_{0,t}$ with respect to
the initial condition can be defined similarly.

Remark 4.1 For every $s > 0$, the quantity $J_{s,t}$ solves the same equation as (4.1),
except for the initial condition which is given by $J_{s,s} = I$.

On the other hand, we can use (3.5) to, at least on a formal level, take the
Malliavin derivative of the integral form of (1.1), which then yields for $r \le t$ the
identity

$$\mathcal{D}^j_rX(t) = \int_r^t DV_0(X_s)\,\mathcal{D}^j_rX_s\,ds + \sum_{i=1}^m\int_r^t DV_i(X_s)\,\mathcal{D}^j_rX_s\circ dW_i(s) + V_j(X_r) .$$

(Here we denote by $\mathcal{D}^j$ the Malliavin derivative with respect to $W_j$; the
generalisation of the discussion of the previous section to the case of finitely many
independent Wiener processes is straightforward.) We see that, save for the initial
condition at time $t = r$ given by $V_j(X_r)$, this equation is identical to the integral
form of (4.1)!

As a consequence, we have for $s < t$ the identity

$$\mathcal{D}^j_sX_t = J_{s,t}V_j(X_s) . \tag{4.2}$$

Furthermore, since $X_t$ is independent of the later increments of $W$, we have
$\mathcal{D}^j_sX_t = 0$ for $s > t$.
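
A discrete analogue of (4.2) can be verified directly. The sketch below is an illustration only: it assumes a scalar equation with drift $V_0(x) = -\sin(x)$ and constant diffusion coefficient $V_1 = 0.8$ (so that the Itô and Stratonovich forms coincide), discretised by the Euler scheme on a uniform mesh. In the notation of Section 3, the Malliavin derivative of the endpoint is the partial derivative with respect to one noise increment; it is compared with the product of the linearised Euler steps, playing the role of $J_{s,t}$, applied to $V_1$. All names and parameter values are choices made for this example.

```python
import math
import random

def b(x):
    return -math.sin(x)          # drift V_0 (illustrative choice)

def db(x):
    return -math.cos(x)          # derivative DV_0 of the drift

SIGMA = 0.8                      # constant diffusion vector field V_1

def euler(x0, dw, dt):
    """Euler scheme x_{k+1} = x_k + b(x_k) dt + SIGMA * dw_k; returns the whole path."""
    path = [x0]
    for inc in dw:
        path.append(path[-1] + b(path[-1]) * dt + SIGMA * inc)
    return path

def malliavin_via_jacobian(path, k, dt):
    """Discrete J_{s,t} V_1: product of the linearised steps after increment k, times SIGMA."""
    J = 1.0
    for i in range(k + 1, len(path) - 1):
        J *= 1.0 + db(path[i]) * dt
    return J * SIGMA

rng = random.Random(2)
N, dt = 50, 0.02
dw = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(N)]
path = euler(0.5, dw, dt)

# Partial derivative of the endpoint with respect to the k-th increment,
# approximated by a central finite difference.
k, eps = 10, 1e-6
up = list(dw)
up[k] += eps
dn = list(dw)
dn[k] -= eps
finite_diff = (euler(0.5, up, dt)[-1] - euler(0.5, dn, dt)[-1]) / (2 * eps)
formula = malliavin_via_jacobian(path, k, dt)
```

At the discrete level the agreement is just the chain rule: perturbing one increment shifts the state by `SIGMA` at the next step, and that shift is then transported to the final time by the linearised dynamics.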

By the composition property $J_{0,t} = J_{s,t}J_{0,s}$, we can write $J_{s,t} = J_{0,t}J_{0,s}^{-1}$,
which will be useful in the sequel. Here, the inverse $J_{0,t}^{-1}$ of the Jacobian can be
found by solving the SDE

$$dJ_{0,t}^{-1} = -J_{0,t}^{-1}\,DV_0(x)\,dt - \sum_{i=1}^m J_{0,t}^{-1}\,DV_i(x)\circ dW_i . \tag{4.3}$$

This follows from the chain rule by noting that if we denote by $\Psi(A) = A^{-1}$
the map that takes the inverse of a square matrix, then we have $D\Psi(A)H =
-A^{-1}HA^{-1}$.

This discussion is the motivation for the following assumption, which we as- 
sume to be in force from now on: 

Assumption 4.2 The vector fields $V_i$ are $\mathcal{C}^\infty$ and all of their derivatives grow at
most polynomially at infinity. Furthermore, they are such that the solutions to (1.1),
(4.1) and (4.3) satisfy

$$\mathbf{E}\sup_{t\le T}|x_t|^p < \infty , \qquad \mathbf{E}\sup_{t\le T}|J^{(k)}_{0,t}|^p < \infty , \qquad \mathbf{E}\sup_{t\le T}|J^{-1}_{0,t}|^p < \infty ,$$

for every initial condition $x_0 \in \mathbf{R}^n$, every terminal time $T > 0$, every $k > 0$, and
every $p > 0$.

Remark 4.3 It is well-known that Assumption 4.2 holds if the Vi are bounded 
with bounded derivatives of all orders. However, this is far from being a necessary 
assumption. 

Remark 4.4 Under Assumption 4.2, standard limiting procedures allow one to justify
(4.2), as well as all the formal manipulations that we will perform in the sequel.

With these assumptions in place, the version of Hörmander's theorem that we
are going to prove in these notes is as follows:

Theorem 4.5 Let $x_0 \in \mathbf{R}^n$ and let $x_t$ be the solution to (1.1). If the vector fields
$\{V_j\}$ satisfy the parabolic Hörmander condition and Assumption 4.2 is satisfied,
then the law of $X_t$ has a smooth density with respect to Lebesgue measure.

Proof. Denote by $A_{0,t}$ the operator $A_{0,t}v = \int_0^t J_{s,t}V(X_s)\,v(s)\,ds$, where $v$ is a
square integrable, not necessarily adapted, $\mathbf{R}^m$-valued stochastic process and $V$
is the $n \times m$ matrix-valued function obtained by concatenating the vector fields
$V_j$ for $j = 1, \ldots, m$. With this notation, it follows from (4.2) that the Malliavin
covariance matrix $M_{0,t}$ of $X_t$ is given by

$$M_{0,t} = A_{0,t}A_{0,t}^* = \int_0^t J_{s,t}V(X_s)V^*(X_s)J_{s,t}^*\,ds .$$

It follows from (4.2) that the assumptions of Theorem 3.2 are satisfied for the
random variable $X_t$, provided that we can show that $\|M_{0,t}^{-1}\|$ has bounded moments
of all orders. This in turn follows by combining Lemma 4.7 with Theorem 4.8
below. □

4.2 Proof of Hormander's Theorem 

The remainder of this section is devoted to a proof of the fact that Hörmander's
condition is sufficient to guarantee the invertibility of the Malliavin matrix of a
diffusion process. For purely technical reasons, it turns out to be advantageous to
rewrite the Malliavin matrix as

$$M_{0,t} = J_{0,t}\,C_{0,t}\,J_{0,t}^* , \qquad C_{0,t} = \int_0^t J_{0,s}^{-1}V(X_s)V^*(X_s)\big(J_{0,s}^{-1}\big)^*\,ds ,$$

where $C_{0,t}$ is the reduced Malliavin matrix of our diffusion process.

Remark 4.6 The reason for considering the reduced Malliavin matrix is that the
process appearing under the integral in the definition of $C_{0,t}$ is adapted to the
filtration generated by $W$. This allows us to use some tools from stochastic calculus
that would not be available otherwise.

Since we assumed that $J_{0,t}$ has inverse moments of all orders, the invertibility
of $M_{0,t}$ is equivalent to that of $C_{0,t}$. Note first that since $C_{0,t}$ is a positive definite
symmetric matrix, the norm of its inverse is given by

$$\|C_{0,t}^{-1}\| = \Big(\inf_{|\eta|=1}\langle\eta, C_{0,t}\eta\rangle\Big)^{-1} .$$
A very useful observation is then the following: 
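
Lemma 4.7 Let $M$ be a symmetric positive semidefinite $n \times n$ matrix-valued
random variable such that $\mathbf{E}\|M\|^p < \infty$ for every $p \ge 1$ and such that, for every
$p \ge 1$ there exists $C_p$ such that

$$\sup_{|\eta|=1}\mathbf{P}\big(\langle\eta, M\eta\rangle < \varepsilon\big) \le C_p\,\varepsilon^p , \tag{4.4}$$

holds for every $\varepsilon \le 1$. Then, $\mathbf{E}\|M^{-1}\|^p < \infty$ for every $p \ge 1$.

Proof. The non-trivial part of the result is that the supremum over $\eta$ is taken outside
of the probability in (4.4). For $\varepsilon > 0$, let $\{\eta_k\}_{k\le N}$ be a sequence of vectors with
$|\eta_k| = 1$ such that for every $\eta$ with $|\eta| \le 1$, there exists $k$ such that $|\eta_k - \eta| \le \varepsilon^2$. It
is clear that one can find such a set with $N \le C\varepsilon^{2-2n}$ for some $C > 0$ independent
of $\varepsilon$. We then have the bound

$$\langle\eta, M\eta\rangle = \langle\eta_k, M\eta_k\rangle + \langle\eta - \eta_k, M\eta\rangle + \langle\eta - \eta_k, M\eta_k\rangle \ge \langle\eta_k, M\eta_k\rangle - 2\|M\|\varepsilon^2 ,$$

so that

$$\mathbf{P}\Big(\inf_{|\eta|=1}\langle\eta, M\eta\rangle \le \varepsilon\Big) \le \mathbf{P}\Big(\inf_{k\le N}\langle\eta_k, M\eta_k\rangle \le 4\varepsilon\Big) + \mathbf{P}\Big(\|M\| \ge \frac{1}{\varepsilon}\Big)$$
$$\le C\varepsilon^{2-2n}\sup_{|\eta|=1}\mathbf{P}\big(\langle\eta, M\eta\rangle \le 4\varepsilon\big) + \mathbf{P}\Big(\|M\| \ge \frac{1}{\varepsilon}\Big) .$$

It now suffices to use (4.4) for $p$ large enough to bound the first term and
Chebyshev's inequality combined with the moment bound on $\|M\|$ to bound the second
term. □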

As a consequence of this, Theorem 4.5 is a corollary of: 

Theorem 4.8 Consider (1.1) and assume that Assumption 4.2 holds. If the
corresponding vector fields satisfy the parabolic Hörmander condition then, for every
initial condition $x \in \mathbf{R}^n$, we have the bound

$$\sup_{|\eta|=1}\mathbf{P}\big(\langle\eta, C_{0,1}\eta\rangle < \varepsilon\big) \le C_p\,\varepsilon^p ,$$

for suitable constants $C_p$ and all $p \ge 1$.

Remark 4.9 The choice t = 1 as the final time is of course completely arbitrary. 
Here and in the sequel, we will always consider functions on the time interval 
[0, 1]. 

Before we turn to the proof of this result, we introduce a very useful notation
which, to the best of the author's knowledge, was first used in [HM11]. Given a
family $A = \{A_\varepsilon\}_{\varepsilon \in (0,1]}$ of events depending on some parameter
$\varepsilon > 0$, we say that $A$ is "almost true" if, for every $p > 0$ there exists a
constant $C_p$ such that $\mathbf{P}(A_\varepsilon) \ge 1 - C_p\varepsilon^p$ for all
$\varepsilon \in (0,1]$. Similarly for "almost false". Given two such families of events
$A$ and $B$, we say that "$A$ almost implies $B$" and we write
$A \Rightarrow_\varepsilon B$ if $A \setminus B$ is almost false. It is straightforward to
check that these notions behave as expected (almost implication is transitive,
finite unions of almost false events are almost false, etc). Note also that these
notions are unchanged under any reparametrisation of the form
$\varepsilon \mapsto \varepsilon^\alpha$ for $\alpha > 0$. Given two families $X$ and $Y$ of
real-valued random variables, we will similarly write $X \le_\varepsilon Y$ as a
shorthand for the fact that $\{X_\varepsilon \le Y_\varepsilon\}$ is "almost true".
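
As an illustration of why these notions "behave as expected", here is a short sketch of the transitivity of almost implication and of the invariance under reparametrisation. The constants $C_p'$ and $C_p''$ are hypothetical names for the constants attached to the two given almost implications.

```latex
% Sketch: transitivity. Assume A \Rightarrow_\varepsilon B with constants C_p'
% and B \Rightarrow_\varepsilon C with constants C_p''.
\begin{align*}
A_\varepsilon \setminus C_\varepsilon
  &\subset (A_\varepsilon \setminus B_\varepsilon)
     \cup (B_\varepsilon \setminus C_\varepsilon) , \\
\mathbf{P}(A_\varepsilon \setminus C_\varepsilon)
  &\le \mathbf{P}(A_\varepsilon \setminus B_\varepsilon)
     + \mathbf{P}(B_\varepsilon \setminus C_\varepsilon)
   \le (C_p' + C_p'')\, \varepsilon^p .
\end{align*}
% Sketch: reparametrisation \varepsilon \mapsto \varepsilon^\alpha, \alpha > 0.
% If A is almost true with constants C_p, apply the definition with
% exponent p/\alpha at the point \varepsilon^\alpha \in (0,1]:
\begin{align*}
\mathbf{P}(A_{\varepsilon^\alpha})
  \ge 1 - C_{p/\alpha}\, (\varepsilon^{\alpha})^{p/\alpha}
  = 1 - C_{p/\alpha}\, \varepsilon^{p} .
\end{align*}
```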

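Before we proceed, we state the following useful result, where $\|\cdot\|_\infty$
denotes the $L^\infty$ norm and $\|\cdot\|_\alpha$ denotes the best possible
$\alpha$-Hölder constant.

Lemma 4.10 Let $f \colon [0,1] \to \mathbf{R}$ be continuously differentiable and let
$\alpha \in (0,1]$. Then, the bound

\[ \|\partial_t f\|_\infty = \|f\|_1 \le 4\|f\|_\infty \max\Big\{1,\; \|f\|_\infty^{-\frac{1}{1+\alpha}}\, \|\partial_t f\|_\alpha^{\frac{1}{1+\alpha}}\Big\} \]

holds, where $\|f\|_\alpha$ denotes the best $\alpha$-Hölder constant for $f$.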

Proof. Denote by $x_0$ a point such that $|\partial_t f(x_0)| = \|\partial_t f\|_\infty$. It
follows from the definition of the $\alpha$-Hölder constant $\|\partial_t f\|_\alpha$ that
$|\partial_t f(x)| \ge \frac12 \|\partial_t f\|_\infty$ for every $x$ such that
$|x - x_0| \le \big(\|\partial_t f\|_\infty / 2\|\partial_t f\|_\alpha\big)^{1/\alpha}$. The claim
then follows from the fact that if $f$ is continuously differentiable and
$|\partial_t f(x)| \ge A$ over an interval $I$, then there exists a point $x_1$ in the
interval such that $|f(x_1)| \ge A|I|/2$. □
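
For the reader's convenience, here is one way to carry out the final step of this proof explicitly. The shorthands $D$ and $H$ and the case distinction are introduced only for this sketch.

```latex
% Sketch, with the shorthand D = \|\partial_t f\|_\infty and
% H = \|\partial_t f\|_\alpha. Take A = D/2 and let I \subset [0,1] be an
% interval containing x_0 on which |\partial_t f| \ge D/2; it can be chosen
% with |I| \ge \min\{1, (D/2H)^{1/\alpha}\}.
\begin{align*}
\|f\|_\infty \ge \frac{A|I|}{2}
  &\ge \frac{D}{4}\,
     \min\Big\{1, \Big(\frac{D}{2H}\Big)^{1/\alpha}\Big\} . \\
\text{If the minimum is } 1:\quad
  & D \le 4\|f\|_\infty . \\
\text{Otherwise:}\quad
  & D^{\frac{1+\alpha}{\alpha}}
    \le 4\,(2H)^{\frac{1}{\alpha}}\,\|f\|_\infty , \\
  & D \le 4^{\frac{\alpha}{1+\alpha}}\, 2^{\frac{1}{1+\alpha}}\,
        \|f\|_\infty^{\frac{\alpha}{1+\alpha}}\, H^{\frac{1}{1+\alpha}}
    \le 4\,\|f\|_\infty \cdot
        \|f\|_\infty^{-\frac{1}{1+\alpha}}\, H^{\frac{1}{1+\alpha}} .
\end{align*}
```

In both cases $D$ is bounded by $4\|f\|_\infty$ times one of the two entries of the maximum, which is the stated bound.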

With these notations at hand, we have the following statement, which is es- 
sentially a quantitative version of the Doob-Meyer decomposition theorem. Orig- 
inally, it appeared in [Nor86], although some form of it was already present in 
earlier works. The statement and proof given here are slightly different from those 
in [Nor86], but are very close to them in spirit.

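Lemma 4.11 Let $W$ be an $m$-dimensional Wiener process and let $A$ and $B$ be
$\mathbf{R}$ and $\mathbf{R}^m$-valued adapted processes such that, for $\alpha = \frac13$,
one has $\mathbf{E}(\|A\|_\alpha + \|B\|_\alpha)^p < \infty$ for every $p$. Let $Z$ be the
process defined by

\[ Z_t = Z_0 + \int_0^t A_s\,ds + \int_0^t B_s\,dW(s) . \tag{4.5} \]

Then, there exists a universal constant $r \in (0,1)$ such that one has

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|A\|_\infty < \varepsilon^r\} \;\&\; \{\|B\|_\infty < \varepsilon^r\} . \]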

Proof. Recall the exponential martingale inequality [RY99, p. 153], stating that if
$M$ is any continuous martingale with quadratic variation process
$\langle M\rangle(t)$, then

\[ \mathbf{P}\Big(\sup_{t \le T} |M(t)| \ge x \;\&\; \langle M\rangle(T) \le y\Big) \le 2\exp(-x^2/2y) , \]

for every positive $T$, $x$, $y$. With our notations, this implies that for any $q < 1$
and any adapted process $F$, one has the almost implication

\[ \{\|F\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \Big\{\Big\|\int_0^\cdot F_t\,dW(t)\Big\|_\infty < \varepsilon^q\Big\} . \tag{4.6} \]

With this bound in mind, we apply Itô's formula to $Z^2$, so that

\[ Z_t^2 = Z_0^2 + 2\int_0^t Z_s A_s\,ds + 2\int_0^t Z_s B_s\,dW(s) + \int_0^t |B_s|^2\,ds . \tag{4.7} \]



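Since $\|A\|_\infty \le_\varepsilon \varepsilon^{-1/4}$ (or any other negative exponent for
that matter) by assumption and similarly for $B$, it follows from this and (4.6) that

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \Big\{\Big\|\int_0^\cdot A_s Z_s\,ds\Big\|_\infty \le \varepsilon^{3/4}\Big\} \;\&\; \Big\{\Big\|\int_0^\cdot B_s Z_s\,dW(s)\Big\|_\infty \le \varepsilon^{2/3}\Big\} . \]

Inserting these bounds back into (4.7) and applying Jensen's inequality then yields

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \Big\{\int_0^1 |B_s|^2\,ds \le \varepsilon^{1/2}\Big\} \Rightarrow \Big\{\int_0^1 |B_s|\,ds \le \varepsilon^{1/4}\Big\} . \]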



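We now use the fact that $\|B\|_\alpha \le_\varepsilon \varepsilon^{-q}$ for every $q > 0$ and
we apply Lemma 4.10 with $\partial_t f(t) = |B_t|$ (we actually do it component by
component), so that

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|B\|_\infty \le \varepsilon^{1/17}\} , \]

say. In order to get the bound on $A$, note that we can again apply the exponential
martingale inequality to obtain that this "almost implies" the martingale part in
(4.5) is "almost bounded" in the supremum norm by $\varepsilon^{1/18}$, so that

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \Big\{\Big\|\int_0^\cdot A_s\,ds\Big\|_\infty \le \varepsilon^{1/18}\Big\} . \]

Finally applying again Lemma 4.10 with $\partial_t f(t) = A_t$, we obtain that

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|A\|_\infty \le \varepsilon^{1/80}\} , \]

and the claim follows with $r = 1/80$. □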
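
As a sanity check, the exponents in this proof can be tracked step by step. The sketch below assumes $\alpha = 1/3$, so that Lemma 4.10 raises $\|f\|_\infty$ to the power $\alpha/(1+\alpha) = 1/4$; the intermediate choice $q = 8/9$ in (4.6) and the treatment of multiplicative constants are one possible reading rather than the only one.

```latex
% Sketch of the exponent bookkeeping on the event \{\|Z\|_\infty < \varepsilon\},
% for \varepsilon small; all bounds hold outside an almost false event.
\begin{align*}
\|ZA\|_\infty,\ \|ZB\|_\infty
  &\le \varepsilon \cdot \varepsilon^{-1/4} = \varepsilon^{3/4} , \\
\Big\|\int_0^\cdot Z_s B_s\, dW(s)\Big\|_\infty
  &\le (\varepsilon^{3/4})^{8/9} = \varepsilon^{2/3}
  && \text{by (4.6) with } q = 8/9 , \\
\int_0^1 |B_s|^2\, ds
  &\le \varepsilon^{2} + 2\varepsilon^{3/4} + 2\varepsilon^{2/3}
   \le \varepsilon^{1/2}
  && \text{by (4.7)} , \\
\int_0^1 |B_s|\, ds
  &\le \Big(\int_0^1 |B_s|^2\, ds\Big)^{1/2} \le \varepsilon^{1/4}
  && \text{by Jensen} , \\
\|B\|_\infty
  &\le 4\, (\varepsilon^{1/4})^{1/4}\, \varepsilon^{-3q/4}
   \le \varepsilon^{1/17}
  && \text{by Lemma 4.10, } q \text{ small} , \\
\Big\|\int_0^\cdot A_s\, ds\Big\|_\infty
  &\le 2\varepsilon + \varepsilon^{1/18} \le 2\varepsilon^{1/18}
  && \text{by (4.5) and (4.6)} , \\
\|A\|_\infty
  &\le 4\, (2\varepsilon^{1/18})^{1/4}\, \varepsilon^{-3q/4}
   \le \varepsilon^{1/80}
  && \text{since } 1/72 > 1/80 .
\end{align*}
```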



Remark 4.12 By making $\alpha$ arbitrarily close to $1/2$, keeping track of the
different norms appearing in the above argument, and then bootstrapping the
argument, it is possible to show that

\[ \{\|Z\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|A\|_\infty \le \varepsilon^p\} \;\&\; \{\|B\|_\infty \le \varepsilon^q\} , \]

for $p$ arbitrarily close to $1/5$ and $q$ arbitrarily close to $3/10$. This seems to be
a very small improvement over the exponent $1/8$ that was originally obtained in
[Nor86], but is certainly not optimal either. The main reason why our result is
suboptimal is that we move several times back and forth between $L^1$, $L^2$, and
$L^\infty$ norms. (Note furthermore that our result is not really comparable to that
in [Nor86], since Norris used $L^2$ norms in the statements and his assumptions
were slightly different from ours.)






We now have all the necessary tools to prove Theorem 4.8: 

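Proof of Theorem 4.8. We fix some initial condition $x_0 \in \mathbf{R}^n$ and some unit
vector $\eta \in \mathbf{R}^n$. With the notation introduced earlier, our aim is then to
show that

\[ \{\langle \eta, \mathcal{C}_{0,1}\eta\rangle < \varepsilon\} \Rightarrow_\varepsilon \emptyset , \tag{4.8} \]

or in other words that the statement $\langle \eta, \mathcal{C}_{0,1}\eta\rangle < \varepsilon$ is
"almost false". As a shorthand, we introduce for an arbitrary smooth vector field
$F$ on $\mathbf{R}^n$ the process $Z_F$ defined by

\[ Z_F(t) = \langle \eta, J_{0,t}^{-1} F(x_t)\rangle , \]

so that

\[ \langle \eta, \mathcal{C}_{0,1}\eta\rangle = \sum_{k=1}^m \int_0^1 |Z_{V_k}(t)|^2\,dt \ge \sum_{k=1}^m \Big(\int_0^1 |Z_{V_k}(t)|\,dt\Big)^2 . \tag{4.9} \]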

The processes $Z_F$ have the nice property that they solve the stochastic
differential equation

\[ dZ_F(t) = Z_{[F,V_0]}(t)\,dt + \sum_{k=1}^m Z_{[F,V_k]}(t) \circ dW_k(t) , \tag{4.10} \]

which can be rewritten in Itô form as

\[ dZ_F(t) = \Big(Z_{[F,V_0]}(t) + \sum_{k=1}^m \frac12 Z_{[[F,V_k],V_k]}(t)\Big)\,dt + \sum_{k=1}^m Z_{[F,V_k]}(t)\,dW_k(t) . \tag{4.11} \]
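
The passage from the Stratonovich form to the Itô form can be sketched as follows, using the standard conversion rule for a continuous semimartingale integrand. The symbol $Y$ and the bracket notation $\langle \cdot, \cdot\rangle$ for the quadratic covariation are introduced only for this sketch.

```latex
% Sketch: Stratonovich-to-Ito correction for the integrand Y = Z_{[F,V_k]}.
% Conversion rule: Y \circ dW_k = Y\, dW_k + \tfrac12\, d\langle Y, W_k \rangle.
% Applying (4.10) with [F,V_k] in place of F, the martingale part of Y is
% \sum_\ell Z_{[[F,V_k],V_\ell]}\, dW_\ell, and d\langle W_\ell, W_k\rangle
% = \delta_{k\ell}\, dt since the components of W are independent.
\begin{align*}
d\langle Z_{[F,V_k]}, W_k \rangle(t)
  &= \sum_{\ell=1}^m Z_{[[F,V_k],V_\ell]}(t)\,
     d\langle W_\ell, W_k \rangle(t)
   = Z_{[[F,V_k],V_k]}(t)\, dt , \\
Z_{[F,V_k]}(t) \circ dW_k(t)
  &= Z_{[F,V_k]}(t)\, dW_k(t)
     + \tfrac12\, Z_{[[F,V_k],V_k]}(t)\, dt .
\end{align*}
```

Summing the second identity over $k = 1, \dots, m$ produces the additional drift term appearing in the Itô form.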

Since we assumed that all derivatives of the $V_j$ grow at most polynomially, we
deduce from the Hölder regularity of Brownian motion that, provided that the
derivatives of $F$ grow at most polynomially fast, $Z_F$ does indeed satisfy the
assumptions on its Hölder norm required for the application of Norris's lemma.
The idea now is to observe that, by (4.9), the left hand side of (4.8) states that
$Z_F$ is "small" for every $F \in \mathcal{V}_0$. One then argues that, by Norris's lemma,
if $Z_F$ is small for every $F \in \mathcal{V}_k$ then, by considering (4.10), it follows
that $Z_F$ is also small for every $F \in \mathcal{V}_{k+1}$. Hörmander's condition then
ensures that a contradiction arises at some stage, since
$Z_F(0) = \langle F(x_0), \eta\rangle$ and there exists $k$ such that $\mathcal{V}_k(x_0)$
spans all of $\mathbf{R}^n$.

Let us make this rigorous. It follows from Norris's lemma and (4.11) that one
has the almost implication

\[ \{\|Z_F\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|Z_{[F,V_k]}\|_\infty < \varepsilon^r\} \;\&\; \{\|Z_G\|_\infty < \varepsilon^r\} , \]

for $k = 1, \dots, m$ and for $G = [F,V_0] + \frac12 \sum_{k=1}^m [[F,V_k],V_k]$. Iterating
this bound a second time, this time considering the equation for $Z_G$, we obtain
that

\[ \{\|Z_F\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|Z_{[[F,V_k],V_\ell]}\|_\infty < \varepsilon^{r^2}\} , \]

so that we finally obtain the implication

\[ \{\|Z_F\|_\infty < \varepsilon\} \Rightarrow_\varepsilon \{\|Z_{[F,V_k]}\|_\infty < \varepsilon^{r^2}\} , \tag{4.12} \]

for $k = 0, \dots, m$.

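At this stage, we are basically done. Indeed, combining (4.9) with Lemma 4.10
as above, we see that there exists some $q_0 > 0$ such that

\[ \{\langle \eta, \mathcal{C}_{0,1}\eta\rangle < \varepsilon\} \Rightarrow_\varepsilon \{\|Z_{V_k}\|_\infty < \varepsilon^{q_0}\} \]

(with $\alpha = \frac13$, (4.9) gives $\int_0^1 |Z_{V_k}(t)|\,dt \le \varepsilon^{1/2}$ and
Lemma 4.10 then allows any $q_0 < 1/8$). Applying (4.12) iteratively, we see that
for every $k > 0$ there exists some $q_k > 0$ such that

\[ \{\langle \eta, \mathcal{C}_{0,1}\eta\rangle < \varepsilon\} \Rightarrow_\varepsilon \bigcap_{V \in \mathcal{V}_k} \{\|Z_V\|_\infty < \varepsilon^{q_k}\} . \]

Since $Z_V(0) = \langle \eta, V(x_0)\rangle$ and since there exists some $k > 0$ such that
$\mathcal{V}_k(x_0) = \mathbf{R}^n$, the right hand side of this expression is empty for
some sufficiently large value of $k$, which is precisely the desired result. □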